Rate Distortion Analyses and Bounds on Speech Codec Performance
Abstract
We develop new rate distortion bounds for narrowband and wideband speech coding based on composite source models for speech and the perceptual distortion measures PESQ-MOS and WPESQ. These bounds are shown to lower bound the performance of important standardized speech codecs, including G.726, G.727, AMR-NB, G.729, G.718, G.722, G.722.1, and AMR-WB. The approach is to calculate rate distortion bounds under the MSE distortion measure, using the classic eigenvalue decomposition and reverse water-filling approach, for each subsource mode of the composite source model, and then to use conditional rate distortion theory to calculate the overall rate distortion function for the composite speech source. Mapping functions are then developed to convert the MSE-based rate distortion functions into rate distortion curves under the perceptually meaningful distortion measures PESQ-MOS and WPESQ. The resulting rate distortion functions are compared to the performance of the best known standardized speech codecs based on the code-excited linear prediction paradigm. An analysis of the tightness of these bounds indicates how the performance of existing voice codecs might be improved.
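The MSE step of the abstract can be illustrated with a minimal sketch of reverse water-filling for Gaussian subsources, followed by a probability-weighted combination across modes. This is not the authors' implementation. The function names, the `(probability, eigenvalues)` representation of a mode, and the use of a single common water level across modes (which for Gaussian subsources under MSE corresponds to an equal-slope allocation of distortion) are all assumptions made for illustration. The mapping to PESQ-MOS/WPESQ is not covered.

```python
import math

def reverse_water_filling(eigenvalues, theta):
    """Parametric R(D) point of a Gaussian source at water level theta.

    eigenvalues: eigenvalues of the source covariance matrix (all > 0).
    Returns (rate in bits per sample, MSE distortion per sample).
    """
    if theta <= 0:
        raise ValueError("water level must be positive")
    n = len(eigenvalues)
    # Components below the water level are not coded: distortion = eigenvalue.
    dist = sum(min(theta, lam) for lam in eigenvalues) / n
    # Components above the water level are coded down to distortion theta.
    rate = sum(0.5 * math.log2(lam / theta)
               for lam in eigenvalues if lam > theta) / n
    return rate, dist

def composite_point(modes, theta):
    """Weighted (rate, distortion) over subsource modes.

    modes: list of (probability, eigenvalues) pairs, one per mode
    (hypothetical representation of the composite source model).
    """
    rate = 0.0
    dist = 0.0
    for prob, eigs in modes:
        r, d = reverse_water_filling(eigs, theta)
        rate += prob * r
        dist += prob * d
    return rate, dist

def water_level_for_distortion(modes, target, iterations=200):
    """Bisection for the common water level giving average MSE `target`."""
    lo = 0.0
    hi = max(max(eigs) for _, eigs in modes)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        # Average distortion is nondecreasing in the water level.
        if composite_point(modes, mid)[1] < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Usage with two made-up modes (illustrative numbers only).
modes = [(0.5, [1.0, 1.0]), (0.5, [0.25, 0.25])]
theta = water_level_for_distortion(modes, 0.1)
rate, dist = composite_point(modes, theta)
```

Sweeping the water level from near zero up to the largest eigenvalue traces the full MSE rate distortion curve for the composite model under these assumptions.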
Similar resources
Rate Distortion Functions and Rate Distortion Function Lower Bounds for Real-World Sources
Although Shannon introduced the concept of a rate distortion function in 1948, only in the last decade has the methodology for developing rate distortion function lower bounds for real-world sources been established. However, these recent results have not been fully exploited due to some confusion about how these new rate distortion bounds, once they are obtained, should be interpreted and shou...
Rapid CODEC adaptation for cellular phone speech recognition
Along with the ever-increasing popularity of cellular phones, improving recognition accuracy in cellular phone speech has become an issue of growing concern. However, the distortion caused by current low-bit-rate speech CODEC is nonlinear, so compensating for distortion by applying only a conventional CMN which assumes distortion is a stationary linear transfer on the cepstrum domain is difficult...
Optimum Source Codec Design in Coded Systems and Its Application for Low-Bit-Rate Speech Transmission
A generalized algorithm for designing an optimum VQ source codec in systems with channel coding is presented. Based on an AWGN channel model, the algorithm derives the distribution of the channel decoder soft-output and substitutes it in the expression for the system end-to-end distortion. The VQ encoder/decoder pair is then optimized by minimizing this end-to-end distortion. For a Gauss-Markov...
Evaluation of the effect of the GSM full rate codec on the automatic detection of laryngeal pathologies based on cepstral analysis
Advances in speech signal analysis during the last decade have allowed the development of automatic algorithms for a non-invasive detection of laryngeal pathologies. Bearing in mind the extension of these automatic methods to remote diagnosis scenarios, this paper analyzes the performance of a pathology detector based on Mel Frequency Cepstral Coefficients when the speech signal has undergone t...
Instantaneous-distortion based weighted acoustic modeling for robust recognition of coded speech
In this paper we apply the Weighted Acoustic Modeling (WAM) technique to the recognition of speech coded by the full-rate GSM codec or the FS-1016 CELP codec employing various estimates of instantaneous distortion. In the WAM method, separate hidden Markov models are developed for regions of speech that exhibit low levels of codec-induced distortion and for regions with higher levels of such di...